Statistics - Paired t-Test
This article explains the Paired Sample t-Test used in statistics.
We will also run a Paired Sample t-Test in Python using the SciPy library.
Paired Sample t-Test
The Paired Sample t-Test is a statistical technique for comparing the means of two related groups. It is typically applied when the same subjects are measured twice, for example before and after a treatment. The test determines whether the mean difference between the paired measurements is statistically significant.
1. Hypothesis Setting
| Hypothesis | Statement | Meaning |
|---|---|---|
| Null hypothesis | $H_0: \mu_D = 0$, where $\mu_D = \mu_1 - \mu_2$ | The difference in means before and after the experiment is 0. |
| Alternative hypothesis | $H_1: \mu_D \neq 0$ | The difference in means before and after the experiment is not 0. |
2. Normality Test
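The paired t-test assumes that the paired differences are approximately normally distributed, so the assumption applies to the differences rather than to each group separately.
If there are fewer than 30 pairs, this assumption should be checked with a normality test.
If there are 30 or more pairs, the Central Limit Theorem makes the sampling distribution of the mean difference approximately normal, so normality is usually assumed (30 is a common rule of thumb, not a strict cutoff).
- In SciPy, normality can be checked with the Shapiro-Wilk test (`scipy.stats.shapiro`).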
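Because the assumption concerns the differences, the Shapiro-Wilk test can be run directly on them. The sketch below assumes made-up before/after scores (not the article's dataset), and `differences_look_normal` is a hypothetical helper name used only for illustration.

```python
import numpy as np
from scipy import stats


def differences_look_normal(x1, x2, alpha=0.05):
    """Shapiro-Wilk test on the paired differences (hypothetical helper)."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    w_stat, p_value = stats.shapiro(d)
    # p > alpha: no evidence against normality, so the paired t-test assumption is plausible
    return w_stat, p_value, bool(p_value > alpha)


# Made-up before/after scores for eight subjects (illustrative only)
before = [55, 61, 48, 70, 66, 52, 59, 63]
after = [60, 64, 55, 71, 72, 58, 61, 69]

w, p, ok = differences_look_normal(before, after)
print(f"W = {w:.4f}, p = {p:.4f}, normality plausible: {ok}")
```

A p-value above the significance level does not prove normality; it only means the test found no evidence against it.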
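3. Calculation of Paired Sample t-Statistic
The paired sample t-statistic is calculated from the differences between paired observations, $d_i = x_{1i} - x_{2i}$: $t = \bar{d} / (s_d / \sqrt{n})$, where $\bar{d}$ is the mean of the differences, $s_d$ is their sample standard deviation, and $n$ is the number of pairs. Under the null hypothesis, $t$ follows a t-distribution with $n - 1$ degrees of freedom.
4. Decision/Conclusion
For this two-sided test, if the absolute value of the calculated t-statistic exceeds the critical value at the chosen significance level (equivalently, if the p-value is below the significance level), the null hypothesis is rejected and the alternative hypothesis is accepted.
Otherwise, the null hypothesis is not rejected.
If the null hypothesis is rejected, we conclude that the mean difference between the paired measurements is statistically significant.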
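The formula and decision rule above can be sketched by hand with NumPy and SciPy's t-distribution. This is a minimal illustration assuming a two-sided test; `paired_t_test` is a hypothetical helper, the scores are made up, and `scipy.stats.ttest_rel` is used only as a cross-check.

```python
import numpy as np
from scipy import stats


def paired_t_test(x1, x2, alpha=0.05):
    """Two-sided paired t-test computed by hand (hypothetical helper)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    d = x1 - x2                                  # paired differences d_i = x1_i - x2_i
    n = d.size                                   # number of pairs
    d_bar = d.mean()                             # mean difference
    s_d = d.std(ddof=1)                          # sample standard deviation of the differences
    t_stat = d_bar / (s_d / np.sqrt(n))
    dof = n - 1                                  # degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, dof)     # two-sided critical value
    p_value = 2 * stats.t.sf(abs(t_stat), dof)   # two-sided p-value
    reject = bool(abs(t_stat) > t_crit)          # decision rule
    return t_stat, p_value, t_crit, reject


# Made-up scores for four subjects (illustrative only)
before = [50, 60, 70, 80]
after = [52, 64, 75, 87]

t_stat, p_value, t_crit, reject = paired_t_test(before, after)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}, critical value = {t_crit:.4f}, reject H0: {reject}")

# Cross-check against SciPy's built-in paired t-test
res = stats.ttest_rel(before, after)
print(np.isclose(t_stat, res.statistic), np.isclose(p_value, res.pvalue))
```

Comparing `|t|` with the critical value and comparing the p-value with `alpha` always lead to the same decision; the sketch computes both to show the equivalence.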
Using the Python Library SciPy
Below, we run the Paired Sample t-Test with the Python SciPy library.
The data comes from the following scenario: in student A's class, there was a rumor that working out improves concentration, so A decided to compare concentration before and after working out. A had 20 people take a concentration test before a training session and again after it.
We want to use a Paired Sample t-Test to check whether there is a significant difference in concentration before and after working out.
The hypothesis is as follows:
Null Hypothesis: The average test scores before and after working out are the same.
Alternative Hypothesis: The average test scores before and after working out are not the same.
The significance level is set at 0.05.
First, let’s load the data.
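```python
>>> import pandas as pd
>>> from scipy import stats
>>> df = pd.read_csv("./data/ch11_training_rel.csv")
>>> # The CSV headers are Korean (전 = before, 후 = after); rename them to match the code below
>>> df = df.rename(columns={"전": "Before", "후": "After"})
>>> df.head()
```

| | Before | After |
|---|---|---|
| 0 | 59 | 41 |
| 1 | 52 | 63 |
| 2 | 55 | 68 |
| 3 | 61 | 59 |
| 4 | 59 | 84 |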
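Each row is one subject measured twice, so the paired test works on the per-row differences. As a quick sketch, the five rows displayed above can be typed in by hand to see those differences; this is only the head of the data, not the full set of 20 subjects.

```python
import pandas as pd

# The first five rows shown above (the full CSV has 20 subjects)
df_head = pd.DataFrame({"Before": [59, 52, 55, 61, 59],
                        "After": [41, 63, 68, 59, 84]})

# One difference per subject: positive means the score went up after working out.
# Note: stats.ttest_rel(df['Before'], df['After']) uses Before - After, so its sign is flipped.
df_head["Diff"] = df_head["After"] - df_head["Before"]
print(df_head)
```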
Next, let’s conduct a normality test.
```python
>>> a = stats.shapiro(df['Before'])
>>> b = stats.shapiro(df['After'])
>>> print(a, b, sep="\n")
ShapiroResult(statistic=0.9670045375823975, pvalue=0.690794825553894)
ShapiroResult(statistic=0.9786625504493713, pvalue=0.9156817197799683)
```
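Both p-values are greater than 0.05, so we do not reject normality for either group and proceed with the t-test. (The same check can also be applied to `df['After'] - df['Before']`, the differences the paired test actually relies on.)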
Next, we calculate the t-statistic and p-value with `stats.ttest_rel`, SciPy's t-test for two related (paired) samples.
```python
>>> t_score, p_value = stats.ttest_rel(df['Before'], df['After'])
>>> print(round(t_score, 4), round(p_value, 2))
-2.2042 0.04
```
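Since the p-value (0.04) is less than the significance level of 0.05, the null hypothesis (the average scores before and after working out are the same) is rejected. Therefore, we conclude that there is a significant difference in the average scores before and after working out. Because `ttest_rel` works with `Before - After`, the negative t-statistic means that scores were higher on average after working out.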
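The same decision can be reached with the critical-value rule from the Decision/Conclusion step. A minimal sketch, assuming the 20 pairs described above (so 19 degrees of freedom) and plugging in the t-statistic reported by `ttest_rel`:

```python
from scipy import stats

alpha = 0.05          # significance level used in this example
n_pairs = 20          # 20 people, each measured before and after
t_score = -2.2042     # t-statistic reported by stats.ttest_rel above
dof = n_pairs - 1     # degrees of freedom for a paired t-test

# Two-sided critical value and p-value from the t-distribution
t_crit = stats.t.ppf(1 - alpha / 2, dof)
p_value = 2 * stats.t.sf(abs(t_score), dof)

reject = abs(t_score) > t_crit
print(f"critical value = {t_crit:.4f}, p = {p_value:.4f}, reject H0: {reject}")
```

With 19 degrees of freedom the two-sided critical value is about 2.093, and |t| = 2.2042 exceeds it, which agrees with the p-value falling below 0.05.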